Cyclization and Docking Protocol for Cyclic Peptide–Protein Modeling Using HADDOCK2.4

An emerging class of therapeutic molecules are cyclic peptides with over 40 cyclic peptide drugs currently in clinical use. Their mode of action is, however, not fully understood, impeding rational drug design. Computational techniques could positively impact their design, but modeling them and their interactions remains challenging due to their cyclic nature and their flexibility. This study presents a step-by-step protocol for generating cyclic peptide conformations and docking them to their protein target using HADDOCK2.4. A dataset of 30 cyclic peptide–protein complexes was used to optimize both cyclization and docking protocols. It supports peptides cyclized via an N- and C-terminus peptide bond and/or a disulfide bond. An ensemble of cyclic peptide conformations is then used in HADDOCK to dock them onto their target protein using knowledge of the binding site on the protein side to drive the modeling. The presented protocol predicts at least one acceptable model according to the critical assessment of prediction of interaction criteria for each complex of the dataset when the top 10 HADDOCK-ranked single structures are considered (100% success rate top 10) both in the bound and unbound docking scenarios. Moreover, its performance in both bound and fully unbound docking is similar to the state-of-the-art software in the field, Autodock CrankPep. The presented cyclization and docking protocol should make HADDOCK a valuable tool for rational cyclic peptide-based drug design and high-throughput screening.


Supplementary Methods
Another metric typically used for interface accuracy is the interface root-mean square deviation (i-RMSD), which is calculated on backbone atoms of both protein and peptide residues that are within a 10 Å radius from each other in the reference complex. However, i-RMSD fails to correctly evaluate acceptable cyclic peptide solutions as illustrated in Figure S1 in which the depicted model has the correct interface while only the flexible non-interacting peptide regions deviate from the crystal structure. This predicted complex scored high for fnat (0.86) with a i-RMSD higher than the acceptable cutoff (2.43Å), demonstrating that the i-RMSD metric can be misleading for the quality of cyclic peptide binding modes. This is because i-RMSD focuses on the backbone conformation of the peptide, while the crucial determinant for cyclic peptide interactions are the correct contacts, mainly obtained via side chains. The fnat metric is, therefore, ideal for reporting on the correct contacts and this is why it was chosen for the analysis in this study as well as in other studies concerning cyclic peptides 25 . Table S1. The Backbone dataset including 18 complexes composed of a receptor and a peptide cyclized through its N-and C-terminus. The length of the peptides varies from 6 to 14 residues and the corresponding apo structure of the receptor was found for all but one complex. The sequence of the peptides is shown with the additional disulfide bond highlighted in black in the respective cases, the termini are highlighted in boldface. a. Absence of the apo receptor structure is indicated with a dash ("-") b. Highlighted in black is the additional disulfide bond found in the peptides of the respective complexes. In black bold face the connection of the termini is represented. Table S2. Disulfide dataset including 12 complexes composed of a receptor and a peptide cyclized through a single disulfide bond. The length of the peptides varies from 6-14 residues and the corresponding apo structure of the receptor was found for 8 out of 12 complexes. As shown in the sequence of each peptide by a black highlight, only one disulfide bond is present in each peptide that cyclizes the sequence. a. Absence of the apo receptor structure is indicated with a dash ("-") b. Highlighted in black is the disulfide bond that cyclizes the peptide. Table S3. Parameters changed in the run.cns file for Step 2 of the cyclization protocol.

PDB ID
The indicated values are the optimized settings for Step 2 of cyclization.
Highlighted in black are the additional settings which are unique for Step 3 and not found in Step 2. The indicated values are the optimized settings for Step 3 of cyclization.

Description Keyword Changes
Minimum distance between C-and Ntermini to create hydrogen bond cyclicpept_dist=3.5 Minimum distance between cysteines to create a disulfide bond disulphide_dist=4.0 the number of peptides within a given (high, medium and acceptable) quality. Peptides were defined as high quality if their RMSD (in relation to the x-ray structure) was lower than 1.5 Å, medium if RMSD lower than 2.0 Å and acceptable if their RMSD was lower than 2.5 Å.
Complexes are sorted according to peptide's sequence length (from short to long peptides).

Structure ensemble
Quality PDB ID 3wne 3zgc 3av9 3ava 3avb 3avf 3avg 3avh 3avi 3avj 3avk 3avm 3avn 5xn3 4kel 1sfi 3p8f 4k1e Average the number of peptides within a given (high, medium and acceptable) quality. Peptides were defined as high quality if their RMSD (in relation to the x-ray structure) was lower than 1.5 Å, medium if RMSD lower than 2.0 Å and acceptable if their RMSD was lower than 2.5 Å.
Complexes are sorted according to peptide's sequence length (from short to long peptides) and residues that are not part of the cyclic region have been excluded from the RMSD calculation.   Table S11. Analysis of 50STR_COMB generated models. Reported are the rank of the first acceptable model (according to fnat and CAPRI criteria) and its fnat and i-RMSD values respectively. Also, the rank of the best fnat value is reported along with the i-RMSD and fnat value for each specific model. Finally, reported is the total number of high, medium and acceptable models in every complex docking run according to their fnat value and the CAPRI criteria. Complexes in the table are sorted according to the sequence length of their peptide (from short to long).  Figure S1. Scatter plot of i-RMSD and fnat of top20 models for all complexes of the Backbone dataset, which were generated using the same protocol. 360 data points were used to create the scatter plot. Illustrated in the circle is 3av9 complex model that scored high fnat (0.86) but also high i-RMSD (2.43).

50STR_COMB
HADDOCK model X-ray structure Figure S3. Fnat success rate (%) plots of experiments using a 60-structures ensemble and generating 5.000 or 10.000 models in it0. Plotted are the success rates of the Backbone dataset (n=18 complexes). Color coding (from blue to green) indicates model quality (acceptable to high), according to CAPRI criteria. Figure S4. Fnat success-rate plots of 50STR_FLEX, 50STR_SOLVSHELL and 50STR docking protocols on the Backbone dataset. Plotted are the success rates of all Backbone complexes for each tested protocol, the color coding (from blue to green) indicates the quality of the models (from acceptable to high) according to CAPRI criteria.

50STR_FLEX 50STR_SOLVSHELL 50STR
Backbone dataset -all peptides 50STR_FLEX_SOLVSHELL Figure S5. Comparison of performance of 50STR, 50STR_FLEX, 50STR_SOLVSHELL and 50STR_FLEX_SOLVSHELL protocols. Each column corresponds to one complex of the Backbone dataset and the y-axis reflects the ranking of the models according to the itw HADDOCK score in every protocol. The color of each model indicates its quality (from acceptable to high) according to the CAPRI criteria. Complexes are sorted in the x-axis according to their sequence length (from shorter to longer),  Figure S6. Comparison of performance of 50STR_COMB experiment in the Backbone and Disulfide dataset. Each column corresponds to one complex of the dataset and the y-axis reflects the ranking of the models according to the itw HADDOCK score in the protocol. Models are colored according to their fnat score (A) or according to their i-RMSD (B). The color of each model indicates its quality (from acceptable to high) according to the CAPRI criteria. Complexes are sorted in the x-axis according to their sequence length (from shorter to longer).  Figure S7. Difference in fnat between models from various stages of HADDOCK (it0/it1/itw) for the 50STR_COMB experiment. The distributions are calculated from all generated models of the 50STR_COMB experiment using the holo structure of the receptor (30 complexes*400 models/complex = 12.000 data-points). (A) The impact of semi-flexible refinement in torsion angle space (it1-it0) is shown. Bin size used for the plot: 35 (B) The impact of final water refinement (or the simple energy minimization step, if short peptides are considered) is shown. Bin size used for the plot is 10.

A B
Short peptides (≤10 residues) Short peptides (≤10 residues) Long peptides (>10 residues) Long peptides (>10 residues) Figure S8. Fnat success-rate (%) plots of 50STR_COMB docking for short (≤10 residues) and long (>10 residues) peptides separately in the Backbone and Disulfide dataset. (A) Plotted are the success-rates of the Backbone short peptides (n=14) and the Disulfide short peptides (n=3). (B) Plotted are the success-rates of the Backbone long peptides (n=4) and the Disulfide long peptides (n=9). Color coding (from blue to green) indicates the quality of models (from acceptable to high) according to CAPRI criteria.

Backbone dataset Disulfide dataset
A B Short peptides (≤10 residues) Short peptides (≤10 residues) Long peptides (>10 residues) Long peptides (>10 residues) Figure S9. Fnat success-rate plots of best-case scenario (bound docking) for the Backbone and the Disulfide dataset. Plotted are the success-rates of docking the holo cyclic peptide structure with the holo receptor structure for both datasets. Color coding (from blue to green) indicates model quality (from acceptable to high), according to CAPRI criteria.
Holo receptor structure

Disulfide dataset
Holo receptor structure Backbone dataset Figure S10. Fnat success-rate plots of best-case scenario (bound docking) for the Backbone and the Disulfide dataset. (A) Plotted are the success-rates of short peptides using the holo receptor structure (B) Plotted are the success rates of long peptides using the holo receptor structure. Color coding (from blue to green) indicates model quality (from acceptable to high), according to CAPRI criteria.